PIXEL REORDERING AND SELECTION LOGIC

RELATED APPLICATIONS 

[0001] This application claims priority to U.S. Patent
Application Serial No. 60/495,301, entitled "PIXEL
REORDERING LOGIC FOR MULTIPLE FORMATS IN A FEEDER", filed
August 14, 2003, by Hatti et al., which is incorporated
herein by reference.

FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
[0002] [Not Applicable] 

[MICROFICHE/COPYRIGHT REFERENCE] 
[0003] [Not Applicable] 

BACKGROUND OF THE INVENTION 

[0004] A video decoder receives encoded video data and 
decodes and/or decompresses the video data. The decoded 
video data comprises a series of pictures. A display device 
displays the pictures. The pictures comprise a
two-dimensional grid of pixels. The display device displays the
pixels of each frame in real time at a constant rate. In
contrast, the rate of decoding can vary considerably for
different video data. Accordingly, the video decoder writes
the decoded pictures in a frame buffer.

[0005] Among other things, a display engine is 
synchronized with the display device and provides the 
appropriate pixels to the display device for display. The 
display engine provides the appropriate pixels from the 
frame buffer to the display device. The location of the 
appropriate pixels in the frame buffer is dependent on the 
manner that the video decoder writes the pictures to the 
frame buffer. 

[0006] Characteristics of the manner in which
the video decoder writes the picture to the frame buffer
include the packing of luma and chroma pixels, the
linearity with which the frame is stored, and the spatial
relationship between the luma and chroma pixels. The
foregoing characteristics are usually determined by the
original format of the source video data.

[0007] The luma and chroma pixels of a picture can 
either be stored together or separately. The chroma pixels 
include chroma red difference pixels Cr, and chroma blue 
difference pixels Cb. In macroblock format, the luma Y 
pixels are stored in one array, while both chroma pixels 
Cr/Cb are stored together in another array. In planar 
format, the luma pixels Y are stored in one array, the
chroma Cr pixels are stored in a second array, and the 
chroma Cb pixels are stored in a third array. In packed YUV 
format, the luma pixels and both the chroma Cr/Cb pixels 
are stored together in a single array. 

[0008] In the packed YUV format, each alternating luma Y
pixel is co-located with chroma pixels Cr and Cb in the horizontal
direction. A picture in the packed YUV format can be
divided into units of four pixels, each of the units
capable of being stored in a 32-bit word. The four pixels
comprise two adjacent luma Y pixels and the chroma pixels Cr/Cb
co-located with one of the luma Y pixels. The luma Y pixels
and the chroma pixels Cr/Cb can be packed in any one of
several pixel orders. Examples of pixel orders in which the
luma Y pixels and chroma pixels Cr/Cb can be packed
include Cb0/Y0/Cr0/Y1, Cr0/Y0/Cb0/Y1, Y0/Cb0/Y1/Cr0, and
Y0/Cr0/Y1/Cb0. Additionally, in big endian order, the four
bytes are stored in a 32-bit dword as
byte0/byte1/byte2/byte3. In little endian order, the four
bytes are stored as byte3/byte2/byte1/byte0. Whether bytes
are stored in big endian byte order or little endian byte
order depends on the hardware characteristics of the frame
buffer memory.
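
By way of illustration only, the packing of one four-pixel unit can be sketched as follows. The function name `pack_unit`, the tuple encoding of the pixel orders, and the sample values are assumptions made for this sketch and do not appear in the application; only the four pixel orders and the two byte orders are taken from the text.

```python
# Illustrative sketch only; names are hypothetical, not from the application.
# The four pixel orders listed in the text, written as tuples of sample names.
PIXEL_ORDERS = (
    ("Cb", "Y0", "Cr", "Y1"),
    ("Cr", "Y0", "Cb", "Y1"),
    ("Y0", "Cb", "Y1", "Cr"),
    ("Y0", "Cr", "Y1", "Cb"),
)


def pack_unit(y0, y1, cb, cr, pixel_order, little_endian=False):
    """Return the four bytes of one unit as they would sit in a 32-bit dword."""
    if pixel_order not in PIXEL_ORDERS:
        raise ValueError("unsupported pixel order")
    samples = {"Y0": y0, "Y1": y1, "Cb": cb, "Cr": cr}
    # byte0/byte1/byte2/byte3 in the requested pixel order
    dword = bytes(samples[name] for name in pixel_order)
    # Little endian stores the same bytes as byte3/byte2/byte1/byte0.
    return dword[::-1] if little_endian else dword
```

The same unit therefore yields eight distinct memory images (four pixel orders times two byte orders), which is why the reader of the frame buffer must know both parameters.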

[0009] The video decoder does not necessarily store the
picture in a linear manner. In planar and packed YUV
formats, the video decoder stores pictures in linear format,
i.e., in left to right and top to bottom order in the memory.
However, in MPEG, DV25, and TM5, pictures are stored in the
frame buffer in a macroblock format. In the macroblock
format, the pixels of the picture are divided into
two-dimensional blocks. The video decoder stores the
two-dimensional blocks in consecutive memory locations.

[0010] Additionally, the spatial relationship of chroma
pixels to luma pixels can differ among the many standards.
Standards defining the spatial relationship of the chroma
pixels to luma pixels include MPEG 4:2:0, MPEG 4:2:2, DV-25
4:2:0, and DV-25 4:1:1, to name a few. Where the standards
for the display and the decoded video data differ, chroma
pixels for the display can be interpolated from two or more
chroma pixels in the decoded video data. The standard for
the decoded video data is heavily dependent on the format
of the source video data.

[0011] Conventionally, after each horizontal
synchronization pulse, the host processor calculates the
address of the first pixels of a line and the parameters
for chroma format conversion. The host processor then
programs the display engine with the foregoing.

[0012] Programming the display engine at each horizontal
synchronization pulse consumes considerable bandwidth from
the host processor.

[0013] Further limitations and disadvantages of 
conventional and traditional approaches will become 
apparent to one of skill in the art, through comparison of 
such systems with embodiments presented in the remainder of 
the present application with references to the drawings. 






BRIEF SUMMARY OF THE INVENTION

[0014] Presented herein is a line address computer for 
calculating the line addresses of decoded video data. 

[0015] In one embodiment, there is presented a method 
for displaying pictures. The method comprises fetching a
portion of a picture stored in a frame buffer, the portion 
of the picture stored with a byte order, storing the 
portion of the picture in another buffer with the byte 
order, fetching a plurality of pixels from the portion of 
the picture, and converting the byte order of the plurality 
of pixels to a predetermined byte order, wherein the byte 
order is different from the predetermined byte order. 

[0016] In another embodiment, there is presented a 
system for displaying pictures. The system comprises a 
first circuit, a buffer, a state machine, and a second 
circuit. The first circuit fetches a portion of a picture 
stored in a frame buffer, the portion of the picture 
stored with a byte order. The buffer stores the portion of 
the picture with the byte order. The state machine fetches 
a plurality of pixels from the portion of the picture. The 
second circuit converts the byte order of the plurality of 
pixels to a predetermined byte order, wherein the byte 
order is different from the predetermined byte order. 

[0017] In another embodiment, there is presented a 
method for displaying pictures. The method comprises
fetching a portion of a picture stored in a frame buffer, 
the portion of the picture stored with a pixel order, 
storing the portion of the picture in another buffer with 
the pixel order, fetching a plurality of pixels from the 
portion of the picture, and converting the pixel order of the
plurality of pixels to a predetermined pixel order.

[0018] In another embodiment, there is presented a
system for displaying pictures. The system comprises a 
first circuit, a buffer, an input data write unit, and a 
second circuit. The first circuit fetches a portion of a 
picture stored in a frame buffer, the portion of the 
picture stored with a pixel order. The buffer stores the 
portion of the picture with the pixel order. The input 
data write unit fetches a plurality of pixels from the 
portion of the picture. The second circuit converts the 
pixel order of the plurality of pixels to a predetermined 
pixel order. 

[0019] In another embodiment, there is presented a 
method for displaying pictures. The method comprises 
fetching a portion of a picture stored in a frame buffer, 
storing the portion of the picture in another buffer, 
fetching a plurality of pixels from the portion of the 
picture, storing luma pixels in a luma pixel register, 
wherein the plurality of pixels comprise luma pixels, and 
storing chroma pixels in a chroma pixel register, wherein 
the plurality of pixels comprise chroma pixels. 
[0020] In another embodiment, there is presented a 
system for displaying pictures. The system comprises a 
first circuit, a buffer, a state machine, a luma pixel 
register, and a chroma pixel register. The first circuit 
fetches a portion of a picture stored in a frame buffer. 
The buffer stores the portion of the picture. The state 
machine fetches a plurality of pixels from the portion of 
the picture. The luma pixel register stores luma pixels,
wherein the plurality of pixels comprise luma pixels. The 
chroma pixel register stores chroma pixels, wherein the 
plurality of pixels comprise chroma pixels.

[0021] These and other advantages and novel features of
the present invention, as well as details of an illustrated 
embodiment thereof, will be more fully understood from the 
following description and drawings. 
BRIEF DESCRIPTION OF SEVERAL VIEWS OF THE DRAWINGS

[0022] FIGURE 1 is a block diagram of an exemplary decoder
system in accordance with an embodiment of the present
invention;

[0023] FIGURE 2 is a block diagram of an exemplary
frame;

[0024] FIGURE 3A is a block diagram of a frame buffer
storing a frame in accordance with the MPEG, DV25 and TM5
formats;

[0025] FIGURE 3B is a block diagram of a frame buffer
storing a frame in accordance with the packed YUV format;

[0026] FIGURE 3C is a block diagram of a frame buffer 
storing a frame in accordance with the planar format; 

[0027] FIGURE 4A is a block diagram of an exemplary 
gword storing packed YUV data in the big endian byte order; 

[0028] FIGURE 4B is a block diagram of an exemplary 
gword storing packed YUV data in the little endian byte 
order; 

[0029] FIGURE 5 is a block diagram of an exemplary gword
storing MPEG/DV-25/TM5 pixels in the big endian byte order;

[0030] FIGURE 6 is a block diagram of an exemplary
display engine in accordance with an embodiment of the
present invention;

[0031] FIGURE 7 is a block diagram of a pixel feeder in
accordance with an embodiment of the present invention;

[0032] FIGURE 8 is a block diagram of the pixel feeder
in accordance with an embodiment of the present invention;

[0033] FIGURE 9 is a block diagram of an endian swizzle
in accordance with an embodiment of the present invention;
and

[0034] FIGURE 10 is a block diagram of pixel select
logic in accordance with an embodiment of the present
invention.



DETAILED DESCRIPTION OF THE INVENTION 
[0035] Referring now to FIGURE 1, there is illustrated a 
block diagram of an exemplary decoder system for decoding 
compressed video data, configured in accordance with an 
embodiment of the present invention. A processor, that may 
include a CPU 90, reads transport stream 65 into a 
transport stream buffer 32 within an SDRAM 30. 
[0036] The data is output from the transport stream 
buffer 32 and is then passed to a data transport processor 
35. The data transport processor 35 then demultiplexes the
transport stream 65 into constituent transport streams. The
constituent transport streams can include, for
example, video transport streams and audio transport
streams. The data transport processor 35 passes an audio
transport stream to an audio decoder 60 and a video 
transport stream to a video transport processor 40. 
[0037] The video transport processor 40 converts the 
video transport stream into a video elementary stream and 
provides the video elementary stream to a video decoder 45. 
The video decoder 45 decodes the video elementary stream, 
resulting in a sequence of decoded video frames. The 
decoding can include decompressing the video elementary 
stream. It is noted that there are various standards for 
compressing the amount of data required for transportation 
and storage of video data, such as MPEG-2.

[0038] The decoded video data includes a series of 
frames. The frames are stored in a frame buffer 48. The
frame buffer 48 can be dynamic random access memory (DRAM)
comprising 128 bit/16 byte gigantic words (gwords). It is
also noted that in certain standards, such as MPEG-2, the
order in which frames are decoded is not necessarily the order
in which frames are presented. Accordingly, several pictures
can be stored in the frame buffer 48 at a given time.
[0039] The display engine 50 is responsible for 
providing a bit stream to a display device, such as a 
monitor or a television. A display device displays the 
pictures in a specific predetermined display format with 
highly synchronized timing. The format dictates the order 
that different portions of a picture are displayed, as well 
as the positions of pixels. 

[0040] Referring now to FIGURE 2, there is illustrated a 
block diagram describing an exemplary picture 100. The
picture 100 comprises any number of horizontal rows
100(0)...100(N). Each row 100(0)...100(N) includes a row of
luma Y pixels, Y0...Yx, and half as many chroma Cr pixels
Cr0...Cr(x-1)/2 and half as many chroma Cb pixels Cb0...Cb(x-1)/2. In
a standard definition television picture 100, there are 480
rows (N=479), each comprising 720 luma Y pixels, 360 chroma
Cr pixels, and 360 chroma Cb pixels.

[0041] The luma Y, chroma Cr, and chroma Cb pixels can 
be stored in one of several array formats. For example, in 
the packed YUV format, the luma Y, chroma Cr, and chroma Cb 
pixels are stored together in one array in linear format. 
In the planar format, the luma pixels, chroma Cr pixels, 
and chroma Cb pixels are each stored in separate arrays in 
linear format. In MPEG, DV25, and TM5, the luma pixels Y 
are stored in one array, while the chroma Cr and chroma Cb 
pixels are stored together in another array in macroblock 
format.

[0042] Referring now to FIGURE 3A, there is illustrated
a block diagram describing the frame buffer storing the
picture 100 in accordance with an array format for the
MPEG, DV25 and TM5 formats. The frame buffer 48 comprises
two arrays 48Y, 48C of 16 byte/128 bit gwords 48Y(0),
48Y(1), 48Y(2),..., and 48C(0), 48C(1), 48C(2),.... The
luma pixels Y are stored in array 48Y. The chroma Cr and Cb
pixels are stored in array 48C. The gwords 48Y(0), 48Y(1),...
each store 16 horizontally adjacent luma pixels, Y16i...Y16i+15.
Each gword in array 48Y is associated with a gword in array
48C, wherein the associated gword in array 48C stores the
chroma Cr and chroma Cb pixels co-located with the luma
pixels Y16i...Y16i+15.

[0043] Referring now to FIGURE 3B, there is illustrated
a block diagram describing the frame buffer 48 storing
picture 100 in accordance with the packed YUV array format.
The frame buffer 48 comprises 16 byte/128 bit gwords 48(0),
48(1), 48(2),.... The pixels Y0...Yx, Cr0...Cr(x-1)/2 in each row of
the frame 100(0)...100(N) are divided into units of four
pixels U0...U(x-1)/2. Each unit Ui comprises two luma pixels Y2i
and Y2i+1, and the chroma Cri pixels and chroma Cbi pixels
co-located with luma pixels Y2i. The units U of each row
100(0)...100(N) are stored from left to right U0...U(x-1)/2 in
consecutive four byte memory portions. The gwords 48(0),
48(1),... can store four units U4i, U4i+1, U4i+2, U4i+3, therein.
The four pixels Y2i, Y2i+1, Cri, Cbi can be stored into four
bytes in one of several pixel orders, including Cbi Y2i Cri Y2i+1,
Cri Y2i Cbi Y2i+1, Y2i Cri Y2i+1 Cbi, and Y2i Cbi Y2i+1 Cri.
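
As a hedged illustration of the indexing just described, the sketch below maps a luma pixel position within a row to its unit, its gword, and the byte offset of that unit inside the gword. The function name and the assumption that each row begins on a gword boundary, with indices counted relative to the start of the row, are made for this sketch only.

```python
# Illustrative sketch only; assumes the row starts on a gword boundary.
UNITS_PER_GWORD = 4   # a 16-byte gword holds four 4-byte units
BYTES_PER_UNIT = 4    # two luma pixels plus one Cr and one Cb pixel


def locate_luma_pixel(x):
    """Map luma pixel Yx to (unit index, gword index, byte offset of unit)."""
    unit = x // 2                                   # Y2i and Y2i+1 share unit Ui
    gword = unit // UNITS_PER_GWORD                 # U4i..U4i+3 share one gword
    byte_offset = (unit % UNITS_PER_GWORD) * BYTES_PER_UNIT
    return unit, gword, byte_offset
```

Under these assumptions a 720-pixel row occupies 360 units, i.e., 90 gwords.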
[0044] Referring now to FIGURE 3C, there is illustrated
a block diagram describing the frame buffer 48 storing
picture 100 in accordance with the planar array format. The
frame buffer 48 comprises three arrays 48Y, 48CR, 48CB of
16 byte/128 bit gwords 48Y(0), 48Y(1), 48Y(2),..., and
48C(0), 48C(1), 48C(2),.... The luma pixels Y are
stored in array 48Y. The chroma Cr pixels are stored in array
48CR. The chroma Cb pixels are stored in array 48CB. The
gwords 48Y(0), 48Y(1),... each store 16 horizontally adjacent
luma pixels, Y16i...Y16i+15. Each gword in array 48Y is
associated with a gword half in array 48CR, and a gword
half in array 48CB, wherein the associated gword halves in
array 48CR and array 48CB store the chroma Cr and chroma Cb
pixels co-located with the luma pixels Y16i...Y16i+15.

[0045] The pixels can be written in either the big endian
byte order, byte0, byte1, byte2, byte3, or the little endian
byte order, byte3, byte2, byte1, byte0.

[0046] Referring now to FIGURE 4A, there is illustrated
a block diagram of an exemplary gword 48(i) storing data in
the big endian byte order. The gword 48(i) comprises 128
bits, b0...b127. In the big endian byte order, bytes are stored
starting from bits b0...b7. The units U4i, U4i+1, U4i+2, U4i+3 are
stored in bits b0...b31, b32...b63, b64...b95, b96...b127, respectively.
Additionally, the first, second, third, and fourth pixel of
unit U4i are stored in bits b0...b7, b8...b15, b16...b23, and b24...b31,
respectively. If the pixels of units U4i, U4i+1, U4i+2, U4i+3
are in the pixel order Cb,Y0,Cr,Y1, the chroma Cb pixels in
units U4i, U4i+1, U4i+2, U4i+3 are stored in bits b0...b7, b32...b39,
b64...b71, and b96...b103, respectively. The first luma pixels
Y0 (that are co-located with the chroma Cr and Cb pixels) of
units U4i, U4i+1, U4i+2, U4i+3 are stored in bits b8...b15, b40...b47,
b72...b79, and b104...b111, respectively. The chroma Cr pixels in
units U4i, U4i+1, U4i+2, U4i+3 are stored in bits b16...b23, b48...b55,
b80...b87, and b112...b119, respectively. The second luma pixels
Y1 of units U4i, U4i+1, U4i+2, U4i+3 are stored in bits b24...b31,
b56...b63, b88...b95, and b120...b127, respectively.

[0047] Referring now to FIGURE 4B, there is illustrated
a block diagram of an exemplary gword 48(i) storing data in
the little endian byte order. The gword 48(i) comprises 128
bits, b127...b0. In the little endian byte order, bytes are
stored starting from bits b127...b120. The units U4i, U4i+1, U4i+2,
U4i+3 are stored in bits b127...b96, b95...b64, b63...b32, b31...b0,
respectively. Additionally, the first, second, third, and
fourth pixel of unit U4i are stored in bits b127...b120,
b119...b112, b111...b104, and b103...b96, respectively. If the pixels
of units U4i, U4i+1, U4i+2, U4i+3 are in the pixel order
Cb,Y0,Cr,Y1, the chroma Cb pixels in units U4i, U4i+1, U4i+2,
U4i+3 are stored in bits b127...b120, b95...b88, b63...b56, and b31...b24,
respectively. The first luma pixels Y0 (that are co-located
with the chroma Cr and Cb pixels) of units U4i, U4i+1,
U4i+2, U4i+3 are stored in bits b119...b112, b87...b80, b55...b48, and
b23...b16, respectively. The chroma Cr pixels in units U4i,
U4i+1, U4i+2, U4i+3 are stored in bits b111...b104, b79...b72, b47...b40,
and b15...b8, respectively. The second luma pixels Y1 of units
U4i, U4i+1, U4i+2, U4i+3 are stored in bits b103...b96, b71...b64,
b39...b32, and b7...b0, respectively.

[0048] From the foregoing, it can be seen that the 32
bits storing a unit U differ between the two byte orders. Additionally, in big
endian, the lowest order bits store the first pixel while
in little endian, the highest order bits store the first
pixel.
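
The bit positions enumerated for FIGURES 4A and 4B follow a regular pattern, which the following sketch reproduces. The function name and argument names are hypothetical; the returned pair is simply the first and last bit index in the order the text lists them (ascending for big endian, descending for little endian).

```python
# Illustrative sketch only; names are hypothetical.
def pixel_bits(unit_in_gword, pixel_in_unit, little_endian=False):
    """Return (first_bit, last_bit) of one pixel inside a 128-bit gword.

    unit_in_gword: 0..3 (U4i .. U4i+3); pixel_in_unit: 0..3 (first..fourth).
    """
    n = unit_in_gword * 4 + pixel_in_unit   # byte position in storage order, 0..15
    if little_endian:
        # Bytes are stored starting from bits b127...b120 and run downward.
        return (127 - 8 * n, 120 - 8 * n)
    # Bytes are stored starting from bits b0...b7 and run upward.
    return (8 * n, 8 * n + 7)
```

For example, the third pixel of unit U4i+1 is byte position 6, which lands in bits b48...b55 in the big endian case and bits b79...b72 in the little endian case, in agreement with the lists above.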

[0049] Referring now to FIGURE 5, there is illustrated a
block diagram of an exemplary gword 48(i) storing data in
the big endian byte order. The gword 48(i) comprises 128
bits, b0...b127. In big endian order, bytes are stored
starting from bits b0...b7. For pixels Y16i...Y16i+15, the pixel
Y16i is stored in bits b0...b7, the pixel Y16i+1 is stored in
bits b8...b15, the pixel Y16i+2 is stored in bits b16...b23, the
pixel Y16i+3 is stored in bits b24...b31, and the pixel Y16i+15 is
stored in bits b120...b127. For pixels Cr/Cb8i...Cr/Cb8i+7, the
pixel Cr8i is stored in bits b0...b7, pixel Cb8i is stored in
bits b8...b15, pixel Cr8i+1 is stored in bits b16...b23, pixel
Cb8i+1 is stored in bits b24...b31, pixel Cr8i+7 is stored in
bits b112...b119, and pixel Cb8i+7 is stored in bits b120...b127.

[0050] From the foregoing, it can be seen that the bits
storing pixels are different. In the big endian byte order,
the lowest order bits store the first pixel while in little
endian byte order, the highest order bits store the first
pixel.

[0051] The display device is usually separate from the 
decoder system. The display device displays the frames with 
highly synchronized timing. Each row 100(0)...100(N) is
displayed at a particular time interval. The display engine
50 provides the pixels to the display device for display,
via the video encoder. The display device and the display
engine 50 are synchronized by means of vertical
synchronization pulses and horizontal synchronization
pulses. When the display device begins displaying a new
frame 100 or field, the display device transmits a vertical
synchronization pulse. Each time the display device begins
displaying a new line 100(x), the display device sends a
horizontal synchronization pulse. The display engine 50
uses the horizontal and vertical synchronization pulses to 
provide a stream comprising the pixels at a time related to 
the time for display. 

[0052] The display engine 50 generates the bitstream 
from the decoded frames stored in the frame buffers 48. To 
generate the bitstream of the pixels for display on the 
display device, the display engine 50 fetches the pixels 
from the frame buffer 48. However, the decoded pictures may 
be progressive while the display device is interlaced. 
Additionally, the decoded picture may have chroma pixels in
different positions from the display format. Furthermore,
the pixels of the decoded frame may be stored in a variety
of different ways. For example, the chroma pixels can 
either be stored separately or with the luma pixels. 

[0053] Where the decoded frame has a different chroma 
format from the display format, the chroma pixels for the 
chroma pixel positions in the display format are 
interpolated from the chroma format of the decoded frame. 

[0054] Referring now to FIGURE 6, there is illustrated a 
block diagram of the display engine 50 in accordance with 
an embodiment of the present invention. The display engine 
50 includes a scaler 705, a compositor 710, a feeder 715,
and a deinterlacing filter 720. The feeder 715 provides a
bitstream of the pixels in the order in which the pixels are
displayed by the display device. The bitstream comprises
chroma pixels in the chroma pixel positions of the display
format.

[0055] Referring now to FIGURE 7, there is illustrated a
block diagram describing an exemplary feeder 715 in
accordance with an embodiment of the present invention. The
feeder 715 provides a bitstream comprising pixels for
display on the display device. The bitstream provides the
pixels for display on the display device at a time related
to the time the pixels are to be displayed by the display
device. Additionally, the bitstream comprises chroma pixels
in the chroma pixel positions in accordance with the
display format. After each horizontal synchronization
pulse, a row 100(x) is presented to the display device 65
for display.

[0056] After each vertical synchronization pulse, the 
host processor 90 programs the feeder 715 with the 
addresses of the frame buffer memory locations storing the 
first luma pixels, the first chroma pixel(s) for display
(i.e., the leftmost pixels in row 100(0)), and the format
of the decoded frame.

[0057] The foregoing parameters are provided to the
feeder 715 via the RBUS interface 805. After providing the
parameters to the RBUS interface 805, the host 90 sets a
start parameter in the RBUS interface 805.

[0058] The RBUS interface 805 provides the initial
starting luma and chroma addresses to the BRM 815. When the
BRM 815 receives the starting luma and chroma addresses,
the start parameter in the RBUS interface 805 is
deasserted. The BRM 815 issues the commands for fetching
the luma and chroma pixels in the first line of the
frame/field. The IDWU 820 effectuates the commands.
[0059] The BRM 815 includes a command state machine 815a
and horizontal address computation logic 815b. The command
state machine 815a can issue commands to the IDWU 820
causing the feeder 715 to fetch pixels from the frame
buffer at a memory address provided by the command state
machine 815a. The command state machine 815a initially commands
the IDWU 820 to fetch the pixels starting at the starting
luma and chroma addresses. The horizontal address computation logic
815b maintains the address of the frame buffer 48 location
storing the next pixels in the display order.

[0060] The IDWU 820 writes the fetched pixels to a
double buffer 840 until the double buffer 840 is full.
After the double buffer 840 is full, the double buffer state
machine detects when half of the data in the double buffer
840 is consumed. Responsive thereto, the command state
machine 815a commands the IDWU 820 to fetch the next pixels
in the display order, starting at the address calculated by
the horizontal address computation logic 815b, until the
double buffer 840 is full. The foregoing continues for each
pixel in the first line 100(0).
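
The refill behavior described above can be modeled, purely as a sketch, by a counter that tracks how much unread data the double buffer holds. The class name, the capacity measured in abstract entries, and the choice to treat "half consumed" as the level dropping to half the capacity or below are assumptions for this sketch; the application does not give the buffer size or the exact threshold test.

```python
# Illustrative model only; names, sizes, and the threshold test are assumptions.
class DoubleBuffer:
    def __init__(self, capacity):
        self.capacity = capacity   # total entries the buffer can hold
        self.level = 0             # entries currently waiting to be read

    def fill(self):
        """Fetch until the buffer is full; return how many entries were fetched."""
        fetched = self.capacity - self.level
        self.level = self.capacity
        return fetched

    def consume(self, n):
        """Read n entries (n must not exceed the current level).

        Returns the number of entries fetched by the refill that the read
        triggered, or 0 if less than half of the data has been consumed.
        """
        if n > self.level:
            raise ValueError("cannot consume more than is buffered")
        self.level -= n
        if self.level <= self.capacity // 2:
            return self.fill()
        return 0
```

In this model each refill request covers half the buffer, so one half can be read while the other half is being written.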

[0061] A line address computer 810 calculates the
address of the memory locations storing the starting pixels
of the next line, e.g., line 100(1) if a progressive
display or line 100(2) if an interlaced display. The BRM
815 causes the IDWU 820 to start fetching pixels from the
provided starting address. For each horizontal
synchronization pulse, the line address computer 810
provides the address of the memory locations storing the
first (leftmost) pixel of a row of luma pixels. The line
address computer 810 provides the address storing the first
pixel of consecutive rows of luma pixels 100(0),
100(1)...100(N) if the display is progressive. The line
address computer 810 provides the address storing the first
pixel of alternating rows of luma pixels 100(0),
100(2)...100(N-1), 100(1), 100(3)...100(N) if the display
device 65 is interlaced. The line address computer 810 is
described in more detail in U.S. Patent Application Serial
No. ____, filed November 7, 2003, by Hatti et al.
(Attorney Docket No. 15139US02), which is
incorporated herein by reference.
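
The two row sequences named above can be generated as follows. This sketch covers only the order in which rows are visited; it does not attempt to reproduce the address arithmetic of the line address computer 810, which is described in the referenced application. The function name is hypothetical.

```python
# Illustrative sketch only; the function name is hypothetical.
def display_row_order(num_rows, interlaced):
    """Return the row indices in the order their start addresses are needed."""
    if not interlaced:
        # Progressive: 100(0), 100(1), ..., 100(N)
        return list(range(num_rows))
    # Interlaced: even rows 100(0), 100(2), ... then odd rows 100(1), 100(3), ...
    return list(range(0, num_rows, 2)) + list(range(1, num_rows, 2))
```

For the 480-row example (N=479), the interlaced sequence ends its first field at row 478 and its second field at row 479.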

[0062] Additionally, as noted above, the feeder 715
interpolates chroma pixels for the chroma pixel positions
in the display picture from the pixels in the decoded
picture.

[0063] At each horizontal synchronization pulse, the
line address computer 810 provides interpolation weights,
WCbT, WCbB, WCrT, and WCrB, for interpolation to a chroma
filter. The interpolation weights depend on the decoded
frame format, the display format, and the specific row with
the chroma pixel positions.
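
The application does not state how the weights are applied. One plausible reading, offered here strictly as an assumption, is that the T and B suffixes denote weights for a top and a bottom chroma line, and that each output chroma pixel is a normalized weighted sum of the two vertically adjacent decoded chroma pixels. The function name, the integer weights, and the rounding rule below are all illustrative choices.

```python
# Illustrative sketch only. Assumes a two-tap vertical blend with integer
# weights normalized by their sum; the application does not specify this.
def interpolate_chroma_row(top, bottom, w_top, w_bottom):
    """Blend two decoded chroma rows into one display chroma row."""
    total = w_top + w_bottom
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Adding total // 2 before the integer division rounds to nearest.
    return [
        (w_top * t + w_bottom * b + total // 2) // total
        for t, b in zip(top, bottom)
    ]
```

Under this reading, the same routine would be run once with WCbT and WCbB on the Cb rows and once with WCrT and WCrB on the Cr rows.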
[0064] A pixel feeder 835 comprises an endian swizzle &
pixel select logic 835a, a chroma filter data path 835b, a 
chroma line buffer 835c, an output data path 835d, fixed 
color generation logic 835e, and a double buffer read state 
machine 835f. The double buffer state machine 835f performs 
various duties that manage the pixel feeder 835. The duties 
include maintaining the double-buffer 840 status, reading 
pixels from the double buffer 840, sequencing the chroma 
filter datapath 835b, and loading pixels onto the FIFO 830. 

[0065] The pixels are fetched from the frame buffer and
stored in the double buffer 840 in the same byte order,
pixel order and array format in which the pixels were stored in
the frame buffer 48. The double buffer read state machine
835f creates a rasterized data stream from the luma pixel
data as well as associated chroma pixel bitstream(s). The
luma pixel data stream and the chroma pixel bitstream(s)
are synchronized with respect to each other, such that the
chroma pixels in the stream(s) at a particular time are
either co-located with the luma pixels in the stream at that
time, or are the pixels for interpolating the
chroma pixels at chroma pixel positions co-located with the
luma pixels.

[0066] Referring now to FIGURE 8, there is illustrated a 
block diagram of the pixel feeder 835 in accordance with an 
embodiment of the present invention. The pixel feeder 835 
includes a data path comprising the endian swizzle 835a(1),
pixel select logic 835a(2), a 32-bit luma pixel register
905Y, a 16-bit chroma Cr pixel register 905R, and a 16-bit 
chroma Cb pixel register 905B. 

[0067] The chroma Cr pixel register 905R and the chroma 
Cb pixel register 905B provide chroma Cr and chroma Cb 
pixels to the vertical chroma filter 835bv. The vertical 
chroma filter 835bv interpolates chroma pixels for the
display format in the vertical direction. The output of the
vertical chroma filter 835bv is provided to the horizontal
chroma filter 835bh. The horizontal chroma filter 835bh
interpolates chroma pixels for the display format in the
horizontal direction.

[0068] A FIFO 830 receives the luma bitstream from the
luma pixel register 905Y and a bitstream of interpolated
chroma pixels. The FIFO 830 also receives signals from a
bus protocol generator 825 to prepare the luma bitstream
and interpolated chroma bitstream for transmission over a
bus.

[0069] The double buffer state machine 835f creates the 
bitstream of chroma and luma pixels by fetching chroma and 
luma pixels from the double buffer 840 at regular time 
intervals for the pixel registers 905. As noted above, the 
pixels are fetched from the frame buffer and stored in the 
double buffer 840 in the same byte order, pixel order and 
array format. The double buffer state machine 835f fetches 
four pixels per double buffer 840 access. Because the 
pixels are stored in the double buffer 840 in the same byte 
order, pixel order and array format as stored in the frame 
buffer 48, the four pixels accessed during each access can 
include different types of pixels. 

[0070] In the case of the packed YUV format, the pixel
registers 905 are filled every two double buffer 840 
accesses. One unit U is accessed during each access. Each 
unit U comprises two luma Y pixels, a chroma pixel Cr, and 
a chroma pixel Cb. The luma pixel register 905Y receives 
the four luma pixels Y, the chroma Cr pixel register 905R 
receives the two chroma pixels Cr, and the chroma Cb pixel 
register 905B receives the two chroma pixels Cb. 
[0071] In the case of the MPEG/DV-25/TM5 formats, four
luma pixels Y are fetched in one double buffer 840 access
and provided to the luma pixel register 905Y. In the next
double buffer 840 access, the two chroma Cr and the two
chroma Cb pixels associated with the four luma pixels are
fetched and provided to the chroma Cr pixel register 905R
and chroma Cb pixel register 905B, respectively.
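
The two register-filling cases can be sketched as below. The function names and the list-based model of the registers are assumptions for illustration; the accesses are taken to be four bytes each and already in the chosen byte order, and the interleaved Cr, Cb, Cr, Cb layout of the chroma access follows the big endian gword layout described for FIGURE 5.

```python
# Illustrative sketch only; names and the list-based register model are
# hypothetical. Each access is a 4-byte sequence in the chosen byte order.
def fill_registers_packed(accesses, pixel_order):
    """Packed YUV: two accesses (one unit U each) fill all three registers."""
    luma, cr, cb = [], [], []
    for access in accesses:
        for name, value in zip(pixel_order, access):
            if name in ("Y0", "Y1"):
                luma.append(value)
            elif name == "Cr":
                cr.append(value)
            else:
                cb.append(value)
    return luma, cr, cb


def fill_registers_macroblock(luma_access, chroma_access):
    """MPEG/DV-25/TM5: one luma access, then one interleaved chroma access."""
    luma = list(luma_access)
    cr = list(chroma_access[0::2])   # bytes 0 and 2 hold Cr
    cb = list(chroma_access[1::2])   # bytes 1 and 3 hold Cb
    return luma, cr, cb
```

In both cases the result is four luma pixels (32 bits), two Cr pixels (16 bits), and two Cb pixels (16 bits), matching the widths of the registers 905Y, 905R, and 905B.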

[0072] Additionally, either the big endian or little 
endian byte order can be used for storing the pixels in the 
double buffer 840. Therefore, the position of each 
particular pixel within the four bytes depends on whether 
the big endian or little endian byte order is used. For 
consistent handling, either the big endian byte order or 
the little endian order is chosen. Bytes of pixel data 
stored in the byte order opposite to the chosen byte order 
can be reordered. The endian swizzle 835a(1) reverses the 
ordering of the pixels from the double buffer 840, from 
either little endian to big endian or big endian to little 
endian, when the byte order of the pixels is different from 
or opposite to the byte order chosen. 

[0073] Because each double buffer 840 access can include 
a variety of different pixels therein, the pixel select 
logic 835a(2) directs the pixels to the appropriate pixel 
registers 905. 

[0074] Referring now to FIGURE 9, there is illustrated a 
block diagram of the endian swizzle 835a(1) in accordance 
with an embodiment of the present invention. The endian 
swizzle 835a(1) receives the four-pixel, 32-bit access from 
the double buffer 840. The 32-bit access is demultiplexed 
into four bytes B0, B1, B2, and B3, each byte corresponding 
to a pixel. The endian swizzle 835a(1) includes four 
multiplexers 1005(0), 1005(1), 1005(2), and 1005(3). 

[0075] If the byte order used for the pixels is different 
from or opposite to the byte order chosen, B0 in the 
original byte order corresponds to B3 of the chosen byte 
order, B1 in the original byte order corresponds to B2 of 
the chosen byte order, B2 in the original byte order 
corresponds to B1 of the chosen byte order, and B3 in the 
original byte order corresponds to B0 of the chosen byte 
order. 

[0076] Accordingly, multiplexers 1005(0) and 1005(3) 
receive bytes B0 and B3. Multiplexers 1005(1) and 1005(2) 
receive bytes B1 and B2. If the original byte order is 
different from or opposite to the chosen byte order, bytes 
B0 and B3 are swapped and bytes B1 and B2 are swapped. 
Multiplexer 1005(0) selects byte B3, multiplexer 1005(1) 
selects byte B2, multiplexer 1005(2) selects byte B1, and 
multiplexer 1005(3) selects byte B0. The outputs of the 
multiplexers 1005 are multiplexed to result in the 32-bit 
access converted to the big-endian byte order, e.g., B3, 
B2, B1, B0. If the original byte order is the same as the 
chosen byte order, the byte ordering is maintained. 
Multiplexer 1005(3) selects byte B3, multiplexer 1005(2) 
selects byte B2, multiplexer 1005(1) selects byte B1, and 
multiplexer 1005(0) selects byte B0. The outputs of the 
multiplexers 1005 are multiplexed to result in the original 
32-bit access, e.g., B0, B1, B2, B3. The multiplexers 1005 
are controlled by a signal Byte_In_DW_endian_Sel, provided 
by the double buffer read state machine 835f, indicating 
whether a different or opposite byte order is originally 
used (for example, 1 indicates used and 0 indicates not 
used). 
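
By way of illustration only, the byte selection performed by the 
multiplexers 1005 can be modeled in software. The following Python 
sketch is not part of the described hardware. The function name 
endian_swizzle and the choice of B0 as the least significant byte of 
the 32-bit access are assumptions made for the example; the select 
argument mirrors the signal Byte_In_DW_endian_Sel. 

```python
def endian_swizzle(access: int, byte_in_dw_endian_sel: int) -> int:
    """Software model of multiplexers 1005(0)..1005(3) on one 32-bit access.

    Assumption: B0 is the least significant byte of the access and B3 the
    most significant; the text does not fix this numbering.
    """
    access &= 0xFFFFFFFF
    # Demultiplex the access into bytes B0..B3 (lanes[k] is Bk).
    lanes = [(access >> (8 * k)) & 0xFF for k in range(4)]
    if byte_in_dw_endian_sel:
        # Multiplexer 1005(k) selects byte B(3-k): B0<->B3 and B1<->B2 swap.
        lanes.reverse()
    # Recombine the four multiplexer outputs into one 32-bit word.
    return sum(byte << (8 * k) for k, byte in enumerate(lanes))
```

Under these assumptions, endian_swizzle(0x11223344, 1) yields 
0x44332211, and a select value of 0 passes the access through 
unchanged. 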

[0077] Referring now to FIGURE 10, there is illustrated 
a block diagram describing an exemplary pixel select logic 
835a(2) in accordance with an embodiment of the present 
invention. The pixel select logic 835a(2) comprises YUV 
reordering logic 1100 and selection logic 1200. 

[0078] The pixel select logic 835a(2) receives the 
output b31...b0 from the endian swizzle 835a(1). Three data 
paths provide the output b31...b0 from the endian swizzle 
835a(1) to the selection logic 1200: the luma pixel path 
1255, the chroma pixel path 1260, and the packed YUV path 
1265. The packed YUV path includes the YUV reordering logic 
1100. 

[0079] As noted above, where the frame 100 is stored in 
the packed YUV array format, the double buffer read state 
machine 835f accesses one unit U per access. The unit U 
comprises two luma pixels, a chroma pixel Cr, and a chroma 
pixel Cb. However, the pixel order within the unit U can 
vary. 

[0080] Accordingly, the YUV reordering logic 1100 
demultiplexes b31...b0 into four bytes, b31...b24, 
b23...b16, b15...b8, and b7...b0. Each of the four bytes is 
provided to multiplexers 1205(0), 1205(1), 1205(2), and 
1205(3). Each multiplexer 1205 is configured to reorder 
pixels from a particular packed YUV format pixel order to 
Y2i, Y2i+1, Cbi, Cri. 

[0081] For example, multiplexer 1205(0) changes the 
packed YUV pixel order Cbi, Y2i, Cri, Y2i+1 to Y2i, Y2i+1, 
Cbi, Cri. Accordingly, the multiplexer 1205(0) reorders the 
bytes b31...b24, b23...b16, b15...b8, and b7...b0 as 
b23...b16, b7...b0, b31...b24, b15...b8. 

[0082] Multiplexer 1205(1) changes the packed YUV pixel 
order Cri, Y2i, Cbi, Y2i+1 to Y2i, Y2i+1, Cbi, Cri. 
Accordingly, the multiplexer 1205(1) reorders the bytes 
b31...b24, b23...b16, b15...b8, and b7...b0 as b23...b16, 
b7...b0, b15...b8, b31...b24. 

[0083] Multiplexer 1205(2) changes the packed YUV pixel 
order Y2i, Cbi, Y2i+1, Cri to Y2i, Y2i+1, Cbi, Cri. 
Accordingly, the multiplexer 1205(2) reorders the bytes 
b31...b24, b23...b16, b15...b8, and b7...b0 as b31...b24, 
b15...b8, b23...b16, b7...b0. 

[0084] Multiplexer 1205(3) changes the packed YUV pixel 
order Y2i, Cri, Y2i+1, Cbi to Y2i, Y2i+1, Cbi, Cri. 
Accordingly, the multiplexer 1205(3) reorders the bytes 
b31...b24, b23...b16, b15...b8, and b7...b0 as b31...b24, 
b15...b8, b7...b0, b23...b16. 

[0085] Another multiplexer 1210 receives the outputs 
of the multiplexers 1205 and selects the multiplexer 1205 
corresponding to the packed YUV pixel order of the fetched 
pixels. The double buffer read state machine 835f provides 
a signal, PackedYUV_DW_Type_Sel, indicating the packed YUV 
format pixel order of the pixels in the frame buffer/double 
buffer 840 (0 => Cbi, Y2i, Cri, Y2i+1; 1 => Cri, Y2i, Cbi, 
Y2i+1; 2 => Y2i, Cbi, Y2i+1, Cri; 3 => Y2i, Cri, Y2i+1, 
Cbi) to the multiplexer 1210. The signal 
PackedYUV_DW_Type_Sel causes the multiplexer 1210 to select 
the multiplexer 1205 associated with the indicated packed 
YUV pixel order. The output of multiplexer 1210 is then 
demultiplexed to separate the two luma pixels Y2i, Y2i+1, 
the chroma pixel Cbi, and the chroma pixel Cri. 
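
As an illustrative software model only, the four reordering 
multiplexers 1205(0) through 1205(3) and the multiplexer 1210 can be 
sketched as a lookup of byte positions followed by a selection. The 
names REORDER and reorder_packed_yuv are hypothetical and chosen for 
this example; the byte positions and the select encoding follow the 
four pixel orders listed above for PackedYUV_DW_Type_Sel. 

```python
# For each PackedYUV_DW_Type_Sel value, the source byte position
# (0 = b31...b24, 1 = b23...b16, 2 = b15...b8, 3 = b7...b0) that supplies
# the outputs Y2i, Y2i+1, Cbi, Cri, in that order.
REORDER = {
    0: (1, 3, 0, 2),  # stored as Cbi, Y2i, Cri, Y2i+1
    1: (1, 3, 2, 0),  # stored as Cri, Y2i, Cbi, Y2i+1
    2: (0, 2, 1, 3),  # stored as Y2i, Cbi, Y2i+1, Cri
    3: (0, 2, 3, 1),  # stored as Y2i, Cri, Y2i+1, Cbi
}


def reorder_packed_yuv(unit: int, type_sel: int) -> tuple:
    """Return (Y2i, Y2i+1, Cbi, Cri) for one 32-bit packed YUV unit U."""
    # Demultiplex the unit into four bytes, most significant byte first.
    src = [(unit >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    # Model of multiplexers 1205(0)..1205(3): one candidate ordering each.
    candidates = [tuple(src[pos] for pos in REORDER[k]) for k in range(4)]
    # Model of multiplexer 1210: pick the candidate matching the stored order.
    return candidates[type_sel]
```

For a unit with Y2i = 0x10, Y2i+1 = 0x20, Cbi = 0x80, and Cri = 0x90, 
the four stored orders 0x80109020, 0x90108020, 0x10802090, and 
0x10902080 all reorder to (0x10, 0x20, 0x80, 0x90) when the matching 
select value 0, 1, 2, or 3 is supplied. 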

[0086] The selection logic 1200 receives pixels via the 
luma path 1255, the chroma path 1260, and the packed YUV 
path 1265. The signal on the luma path 1255 is 
demultiplexed into two 16-bit components, b31...b16 and 
b15...b0. The signal on the chroma path 1260 is 
demultiplexed into four 8-bit components, b31...b24, 
b23...b16, b15...b8, and b7...b0. The selection logic 
comprises six multiplexers 1205Y(1), 1205Y(0), 1205B(1), 
1205B(0), 1205R(1), and 1205R(0). The luma pixel register 
905Y receives a 16-bit output b31...b16 from multiplexer 
1205Y(1) and a 16-bit output b15...b0 from multiplexer 
1205Y(0). The chroma Cb pixel register 905B receives an 
8-bit output b15...b8 from multiplexer 1205B(1) and an 
8-bit output from multiplexer 1205B(0). The chroma Cr pixel 
register 905R receives an 8-bit output b15...b8 from 
multiplexer 1205R(1) and an 8-bit output from multiplexer 
1205R(0). 

[0087] The multiplexer 1205Y(1) receives the luma pixels 
Y2i, Y2i+1 from the packed YUV path 1260 and bits b31...b16 
from the luma path 1255. Multiplexer 1205Y(0) receives the 
luma pixels Y2i, Y2i+1 from the packed YUV path 1260 and 
bits b15...b0 from the luma path 1255. 

[0088] The multiplexer 1205B(1) receives a chroma pixel 
Cbi from the packed YUV path 1260 and bits b31...b24 from 
the chroma path 1265. The multiplexer 1205B(0) receives a 
chroma pixel Cbi from the packed YUV path 1260 and bits 
b23...b16 from the chroma path 1265. 

[0089] The multiplexer 1205R(1) receives a chroma pixel 
Cri from the packed YUV path 1260 and bits b15...b8 from 
the chroma path 1265. The multiplexer 1205R(0) receives a 
chroma pixel Cri from the packed YUV path 1260 and bits 
b7...b0 from the chroma path 1265. 

[0090] Each of the multiplexers 1205 is controlled by a 
signal Packed_YUV provided by the double buffer read state 
machine 835f. When the picture 100 is in the MPEG/DV-25/TM5 
format, the luma path 1255 and chroma path 1265 carry four 
luma pixels Y4i, Y4i+1, Y4i+2, Y4i+3 during one double 
buffer 840 access, followed by two chroma pixels Cb2i, 
Cb2i+1 and two chroma pixels Cr2i, Cr2i+1 during the next 
double buffer 840 access, in alternating fashion. The 
multiplexers 1205Y(1) and 1205Y(0) select the respective 
portions of the luma path 1255. The multiplexers 1205B(1), 
1205B(0), 1205R(1), and 1205R(0) select the respective 
portions of the chroma path 1265. 

[0091] When the picture 100 is in the packed YUV array 
format, the packed YUV path 1260 carries two luma pixels 
Y2i, Y2i+1 and chroma pixels Cbi and Cri during each 
access. Each of the multiplexers 1205 selects the 
respective portions of the packed YUV path 1260. 
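
The choice made by the six selection multiplexers can be sketched in 
software as follows. This Python model is offered as an illustration 
under stated assumptions, not as the described circuit: the function 
name selection_logic and the dictionary keys "Y1", "Y0", "B1", "B0", 
"R1", and "R0" (standing for multiplexers 1205Y(1) through 1205R(0)) 
are hypothetical, and placing Y2i in the upper byte of the 16-bit 
packed luma value is an assumption. 

```python
def selection_logic(swizzled: int, packed: tuple, packed_yuv: int) -> dict:
    """Model the outputs of multiplexers 1205Y(1)..1205R(0).

    swizzled   -- 32-bit word b31...b0 on the luma and chroma paths
    packed     -- (Y2i, Y2i+1, Cbi, Cri) from the YUV reordering logic
    packed_yuv -- value of the Packed_YUV control signal
    """
    y2i, y2i1, cbi, cri = packed
    if packed_yuv:
        # Packed YUV format: every multiplexer takes the packed YUV path.
        luma = (y2i << 8) | y2i1  # assumed byte placement within 16 bits
        return {"Y1": luma, "Y0": luma,
                "B1": cbi, "B0": cbi, "R1": cri, "R0": cri}
    # MPEG/DV-25/TM5 formats: take slices of the luma and chroma paths.
    return {
        "Y1": (swizzled >> 16) & 0xFFFF,  # b31...b16
        "Y0": swizzled & 0xFFFF,          # b15...b0
        "B1": (swizzled >> 24) & 0xFF,    # b31...b24
        "B0": (swizzled >> 16) & 0xFF,    # b23...b16
        "R1": (swizzled >> 8) & 0xFF,     # b15...b8
        "R0": swizzled & 0xFF,            # b7...b0
    }
```

In this model both halves of each register are offered the same packed 
pixel; which half actually stores it is decided by the register load 
controls rather than by the multiplexers. 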

[0092] The pixel registers 905 load the outputs from the 
multiplexers 1205 connected thereto, responsive to 
control signals 910 provided by the double buffer read 
state machine 835f. As noted above, when the frame 100 is 
stored in the array format for MPEG/DV-25/TM5, double 
buffer 840 accesses provide either four luma pixels or two 
chroma Cr and two chroma Cb pixels, in alternating 
fashion. 

[0093] Accordingly, when the double buffer 840 access 
provides four luma pixels, the control signals 910Y(1) and 
910Y(0) controlling the luma pixel register 905Y are 
asserted, causing the luma pixel register 905Y to load the 
outputs of multiplexers 1205Y(1) and 1205Y(0). 

[0094] When the double buffer 840 access provides chroma 
pixels, the control signals 910B(1), 910B(0), 910R(1), and 
910R(0) controlling the chroma Cr pixel register 905R and 
the chroma Cb pixel register 905B are asserted, causing the 
chroma Cr pixel register 905R and chroma Cb pixel register 
905B to load the outputs of multiplexers 1205B(1), 1205B(0) 
and multiplexers 1205R(1), 1205R(0). The foregoing results 
in the pixel registers 905Y, 905B, and 905R storing four 
luma pixels, two chroma Cb pixels, and two chroma Cr 
pixels, respectively, after every two double buffer 840 
accesses, wherein the chroma pixels are associated with the 
luma pixels. For example, the chroma pixels can be 
co-located with the luma pixels in the picture 100. 

[0095] When the picture 100 is stored in the packed YUV 
array format, each double buffer 840 access provides two 
luma pixels, a chroma Cr pixel, and a chroma Cb pixel. The 
control signals 910Y(1), 910B(1), and 910R(1) control the 
halves of registers 905Y, 905B, and 905R that store the 
most significant bytes. The control signals 910Y(0), 
910B(0), and 910R(0) control the halves of registers 905Y, 
905B, and 905R that store the least significant bytes. The 
control signals 910Y(1), 910B(1), and 910R(1) are asserted 
in alternating fashion with control signals 910Y(0), 
910B(0), and 910R(0), causing the pixel registers 905Y, 
905B, and 905R to store four luma pixels, two chroma Cb 
pixels, and two chroma Cr pixels after every two double 
buffer 840 accesses, wherein the chroma pixels are 
associated with the luma pixels. For example, the chroma 
pixels are co-located with the luma pixels in the picture 
100. 
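
The two load schedules described above can be compared with a small 
software model. The sketch below is illustrative only. The names 
MPEG_SCHEDULE, PACKED_SCHEDULE, and fill_registers are hypothetical, 
the keys "Y1" through "R0" stand for the register halves governed by 
control signals 910Y(1) through 910R(0), and loading the most 
significant halves on the first packed YUV access is an assumption, 
since the text states only that the two groups alternate. 

```python
# Register halves whose control signals 910 are asserted on the first and
# second double buffer access of each pair.
MPEG_SCHEDULE = ({"Y1", "Y0"}, {"B1", "B0", "R1", "R0"})
PACKED_SCHEDULE = ({"Y1", "B1", "R1"}, {"Y0", "B0", "R0"})  # order assumed


def fill_registers(mux_outputs, schedule) -> dict:
    """Load pixel register halves over two consecutive accesses.

    mux_outputs -- two dicts of selection multiplexer outputs, one per access
    schedule    -- which halves load on each access (see constants above)
    """
    regs = dict.fromkeys(("Y1", "Y0", "B1", "B0", "R1", "R0"), 0)
    for mux_out, asserted in zip(mux_outputs, schedule):
        for half in asserted:
            # Only halves with an asserted control signal capture their input.
            regs[half] = mux_out[half]
    return regs
```

With either schedule, two accesses leave the modeled registers holding 
four luma bytes, two Cb bytes, and two Cr bytes, which matches the 
register contents described for both array formats. 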

[0096] One embodiment of the present invention may be 
implemented as a board level product, as a single chip, 
application specific integrated circuit (ASIC), or with 
varying levels integrated on a single chip with other 
portions of the system as separate components. 

[0097] The degree of integration of the system will 
primarily be determined by speed and cost considerations. 
Because of the sophisticated nature of modern processors, 
it is possible to utilize a commercially available 
processor, which may be implemented external to an ASIC 
implementation of the present system. 

[0098] Alternatively, if the processor is available as 
an ASIC core or logic block, then the commercially 
available processor can be implemented as part of an ASIC 
device with various functions implemented as firmware. 

[0099] While the invention has been described with 
reference to certain embodiments, it will be understood by 
those skilled in the art that various changes may be made 
and equivalents may be substituted without departing from 
the scope of the invention. In addition, many modifications 
may be made to adapt a particular situation or material to 
the teachings of the invention without departing from its 
scope. 

[00100] Therefore, it is intended that the invention not 
be limited to the particular embodiment(s) disclosed, but 
that the invention will include all embodiments falling 
within the scope of the appended claims. 